Approximating Sparse PCA from Incomplete Data

Authors

  • Abhisek Kundu
  • Petros Drineas
  • Malik Magdon-Ismail
Abstract

We study how well one can recover the sparse principal components of a data matrix from a sketch formed from only a few of its elements. We show that, for a wide class of optimization problems, if the sketch is close to the original data matrix in the spectral norm, then solving the problem on the sketch yields a near-optimal solution to the original problem. In particular, we use this approach to obtain sparse principal components and show that, for $m$ data points in $n$ dimensions, $O(\epsilon^{-2}\tilde{k}\max\{m,n\})$ elements give an $\epsilon$-additive approximation to the sparse PCA problem, where $\tilde{k}$ is the stable rank of the data matrix. We evaluate our algorithms extensively on image, text, biological, and financial data. The results show that we can recover the sparse principal components from the incomplete data, and that using our sparse sketch reduces the running time by a factor of five or more.
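The abstract does not specify the sampling distribution or the sparse PCA solver, so the following Python sketch fills both in with assumptions of our own and is not the authors' method. It keeps entries with probabilities that mix absolute values and squares, in the spirit of the hybrid ($\ell_1$, $\ell_2$) scheme named among the similar resources (the weight alpha = 0.5 and the independent Bernoulli form are assumptions). It rescales each kept entry so that the sketch is unbiased, then solves sparse PCA on the sketch with truncated power iteration, a standard heuristic used here in place of the paper's solver. The function names hybrid_element_sketch and truncated_power, the planted 5-sparse data, and the budget s are hypothetical. The example illustrates only the workflow the abstract describes: solve on the sketch, then score the answer on the original data.

```python
import numpy as np


def hybrid_element_sketch(A, s, alpha, rng):
    """Hypothetical element-wise sampling sketch (illustrative, not the paper's
    exact scheme). Entry (i, j) is kept independently with probability
        p_ij = min(1, s * q_ij),
        q_ij = alpha * |A_ij| / ||A||_1 + (1 - alpha) * A_ij**2 / ||A||_F**2,
    and kept entries are rescaled by 1 / p_ij so that E[S] = A.
    Because the q_ij sum to 1, at most s entries are kept in expectation."""
    q = alpha * np.abs(A) / np.sum(np.abs(A)) + (1 - alpha) * A**2 / np.sum(A**2)
    p = np.minimum(1.0, s * q)
    keep = rng.random(A.shape) < p       # entries with A_ij == 0 have p_ij == 0
    S = np.zeros_like(A)
    S[keep] = A[keep] / p[keep]
    return S


def truncated_power(C, r, iters=100):
    """Sparse PCA heuristic (swapped in for illustration): truncated power
    iteration on a PSD matrix C, returning a unit vector with at most r nonzeros."""
    n = C.shape[0]
    x = np.zeros(n)
    x[np.argsort(np.diag(C))[-r:]] = 1.0  # start on the r largest diagonal entries
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = C @ x
        support = np.argsort(np.abs(y))[-r:]  # keep the r largest |y_i|
        z = np.zeros(n)
        z[support] = y[support]
        norm = np.linalg.norm(z)
        if norm == 0.0:
            break
        x = z / norm
    return x


# Hypothetical data: m points in n dimensions with one planted r-sparse direction.
rng = np.random.default_rng(0)
m, n, r = 500, 50, 5
v = np.zeros(n)
v[:r] = 1.0 / np.sqrt(r)
A = 5.0 * np.outer(rng.standard_normal(m), v) + 0.5 * rng.standard_normal((m, n))

S = hybrid_element_sketch(A, s=0.25 * m * n, alpha=0.5, rng=rng)  # about a quarter of the entries
x_sketch = truncated_power(S.T @ S, r)   # solved on the sketch only
x_full = truncated_power(A.T @ A, r)     # reference solution on the full data

# Score both sparse components on the ORIGINAL data matrix.
ratio = np.linalg.norm(A @ x_sketch) ** 2 / np.linalg.norm(A @ x_full) ** 2
print("kept entries:", np.count_nonzero(S), "of", A.size)
print("relative spectral error:", np.linalg.norm(A - S, 2) / np.linalg.norm(A, 2))
print("support from sketch:", sorted(np.flatnonzero(x_sketch).tolist()))
print("variance captured relative to full-data solution:", ratio)
```

Mixing in the $\ell_1$ term is deliberate. With pure $\ell_2$ probabilities, a tiny entry that happens to be kept becomes $A_{ij}/p_{ij} \propto 1/A_{ij}$, so the sketch can contain rare but very large values. In this variant, every rescaled entry is bounded by $\|A\|_1/(\alpha s)$ unless it was kept with probability one.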

Similar resources

Recovering PCA and Sparse PCA via Hybrid-(l1, l2) Sparse Sampling of Data Elements

This paper addresses how well we can recover a data matrix when only given a few of its elements. We present a randomized algorithm that element-wise sparsifies the data, retaining only a few of its entries. Our new algorithm independently samples the data using probabilities that depend on both squares ($\ell_2$ sampling) and absolute values ($\ell_1$ sampling) of the entries. We prove that this hybrid al...

Sparse Kernel Principal Component Analysis

'Kernel' principal component analysis (PCA) is an elegant nonlinear generalisation of the popular linear data analysis method, where a kernel function implicitly defines a nonlinear transformation into a feature space wherein standard PCA is performed. Unfortunately, the technique is not 'sparse', since the components thus obtained are expressed in terms of kernels associated with every trainin...

Greedy Bilateral Sketch, Completion & Smoothing

Recovering a large low-rank matrix from highly corrupted, incomplete or sparse outlier overwhelmed observations is the crux of various intriguing statistical problems. We explore the power of “greedy bilateral (GreB)” paradigm in reducing both time and sample complexities for solving these problems. GreB models a low-rank variable as a bilateral factorization, and updates the left and right fact...

Sparse Statistical Deformation Model for the Analysis of Craniofacial Malformations in the Crouzon Mouse

Crouzon syndrome is characterised by the premature fusion of cranial sutures. Recently the first genetic Crouzon mouse model was generated. In this study, Micro CT skull scannings of wild-type mice and Crouzon mice were investigated. Using nonrigid registration, a wild-type craniofacial mouse atlas was built. The atlas was registered to all mice providing parameters controlling the deformations...

Sparse Additive Matrix Factorization for Robust PCA and Its Generalization

Principal component analysis (PCA) can be regarded as approximating a data matrix with a low-rank one by imposing sparsity on its singular values, and its robust variant further captures sparse noise. In this paper, we extend such sparse matrix learning methods, and propose a novel unified framework called sparse additive matrix factorization (SAMF). SAMF systematically induces various types of...

Publication date: 2015